Privacy-preserving heterogeneous health data sharing
Abstract
OBJECTIVE Privacy-preserving data publishing addresses the problem of disclosing sensitive data when mining for useful information. Among existing privacy models, ε-differential privacy provides one of the strongest privacy guarantees and makes no assumptions about an adversary's background knowledge. All existing solutions that ensure ε-differential privacy handle the problem of disclosing relational and set-valued data in a privacy-preserving manner separately. In this paper, we propose an algorithm that considers both relational and set-valued data in differentially private disclosure of healthcare data.

METHODS The proposed approach makes a simple yet fundamental switch in differentially private algorithm design: instead of listing all possible records (ie, a contingency table) for noise addition, records are generalized before noise addition. The algorithm first generalizes the raw data in a probabilistic way, and then adds noise to guarantee ε-differential privacy.

RESULTS We showed that the disclosed data could be used effectively to build a decision tree induction classifier. Experimental results demonstrated that the proposed algorithm is scalable and performs better than existing solutions for classification analysis.

LIMITATION The resulting utility may degrade when the output domain size is very large, making it potentially inappropriate to generate synthetic data for large health databases.

CONCLUSIONS Unlike existing techniques, the proposed algorithm allows the disclosure of health data containing both relational and set-valued data in a differentially private manner, and can retain essential information for discriminative analysis.
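The generalize-then-perturb idea in the METHODS paragraph can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the taxonomy, the record layout `(age, codes, label)`, the candidate cut points, the majority-class score, and the even split of ε between the two steps are all assumptions made for illustration. The sketch picks an age cut with the exponential mechanism (the probabilistic generalization), maps diagnosis codes to hypothetical parent categories, and then adds Laplace noise to the count of every cell in the generalized domain, including empty cells.

```python
import itertools
import math
import random
from collections import Counter

# Hypothetical taxonomy for the set-valued attribute (diagnosis codes).
TAXONOMY = {"flu": "respiratory", "asthma": "respiratory",
            "angina": "cardiac", "arrhythmia": "cardiac"}
PARENTS = sorted(set(TAXONOMY.values()))
CLASSES = ["N", "Y"]  # assumed fixed, data-independent class labels


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def cut_score(records, cut):
    """Utility of an age cut: sum of majority-class counts on each side.
    Adding or removing one record changes this score by at most 1."""
    sides = {True: Counter(), False: Counter()}
    for age, _codes, label in records:
        sides[age < cut][label] += 1
    return sum(max(c.values(), default=0) for c in sides.values())


def choose_cut(records, candidates, epsilon, rng):
    """Exponential mechanism with sensitivity 1: pick a cut with
    probability proportional to exp(epsilon * score / 2)."""
    scores = [cut_score(records, c) for c in candidates]
    top = max(scores)  # shifting scores leaves probabilities unchanged
    weights = [math.exp(epsilon * (s - top) / 2) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


def generalize(record, cut):
    """Map a raw record to its generalized cell."""
    age, codes, label = record
    age_bin = f"<{cut}" if age < cut else f">={cut}"
    return age_bin, frozenset(TAXONOMY[c] for c in codes), label


def publish(records, candidates, epsilon, rng):
    """Generalize first, then add noise; half of epsilon goes to each step."""
    eps_select, eps_noise = epsilon / 2, epsilon / 2
    cut = choose_cut(records, candidates, eps_select, rng)
    true_counts = Counter(generalize(r, cut) for r in records)
    age_bins = [f"<{cut}", f">={cut}"]
    code_sets = [frozenset(s) for n in range(len(PARENTS) + 1)
                 for s in itertools.combinations(PARENTS, n)]
    # Noise is added to every cell of the generalized domain, empty or not.
    released = {
        cell: true_counts[cell] + laplace_noise(1 / eps_noise, rng)
        for cell in itertools.product(age_bins, code_sets, CLASSES)
    }
    return cut, released


records = [(34, {"flu"}, "N"), (61, {"angina", "asthma"}, "Y"),
           (47, {"arrhythmia"}, "Y"), (29, {"asthma"}, "N")]
cut, table = publish(records, candidates=[30, 40, 50, 60],
                     epsilon=1.0, rng=random.Random(0))
```

With two age bins, four possible sets of parent categories, and two class labels, the released table has 16 noisy cells, far fewer than a contingency table over raw ages and raw code sets. The candidate cuts and class labels must be fixed independently of the data for the privacy argument to hold.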
Similar resources
Framework Design and Case Study for Privacy-Preserving Medical Data Publishing
With the pervasive use of Electronic Medical Records (EMR) and telemedicine technologies, more and more digital healthcare data are accumulated from multiple sources. As healthcare data are valuable for both commercial and scientific research, the demand for sharing healthcare data has been growing rapidly. Nevertheless, healthcare data normally contain a large amount of personal information,...
A Framework for Privacy-Preserving Medical Document Sharing
Health information systems have greatly increased availability of medical documents and benefited healthcare management and research. However, there are growing concerns about privacy in sharing medical documents. Existing approaches for privacy-preserving data sharing deal mostly with structured data. Current privacy techniques for unstructured medical text focus on detection and removal of pat...
Privacy-preserving Sanitization in Data Sharing
Preserving privacy in shared provenance data
Provenance management still lacks robust models for sharing provenance data between multiple parties while keeping parts of it private to the owner. This limits the potential for provenance dissemination, which is a critical step in enabling data sharing amongst partners with limited a priori mutual trust. In turn, this has a negative impact on data-intensive science and its associated research...
A New Privacy-Preserving Data Publishing Method Aimed at Improving Classification Accuracy on Anonymized Data
Data collection and storage have been facilitated by the growth in electronic services, which has led to vast amounts of personal information being recorded in the databases of public and private organizations. These records often include sensitive personal information (such as income and diseases) and must be protected from access by others. But in some cases, mining the data and extraction of knowledge from thes...
Journal: Journal of the American Medical Informatics Association : JAMIA
Volume 20, Issue 3
Publication date: 2013